Chau Tran
2023-11-06
As we can see, the Date column is in the format Month Day, Year.
Because I will analyze the data by month, day, and year separately, I will extract each of these parts from the Date column, starting with the year.
First, I check the data type of the ‘date’ column. The ‘date’ column is of type character.
I convert the ‘date’ column from character to class ‘Date’, then extract the year from it.
I use the same process to extract the day from the ‘date’ column.
I use the same process to extract the month from the ‘date’ column.
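The conversion and extraction steps above can be sketched as follows. This is a minimal illustration, not the original chunk: the data frame name `crashes`, the new column names `Year`, `Month`, and `Day`, and the two sample rows are assumptions. It also assumes an English locale, since `%B` matches full month names in the current locale, and it treats "day" as the day of the month.

```r
# Toy rows in the same "Month Day, Year" format as the Date column (made-up sample data).
crashes <- data.frame(
  Date = c("January 01, 2001", "March 15, 1999"),
  stringsAsFactors = FALSE
)

# Check the type before converting: this is "character".
class(crashes$Date)

# Convert character to class Date. %B = full month name, %d = day, %Y = 4-digit year.
crashes$Date <- as.Date(crashes$Date, format = "%B %d, %Y")

# Extract each part by formatting the Date and converting the result to integer.
crashes$Year  <- as.integer(format(crashes$Date, "%Y"))
crashes$Month <- as.integer(format(crashes$Date, "%m"))
crashes$Day   <- as.integer(format(crashes$Date, "%d"))
```

Rows whose text does not match the format become `NA` after `as.Date()`, so it is worth counting `NA` values right after the conversion.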
Missing values account for 30% of the data in the time column.
Missing values account for 74% of the data in the flight column.
Missing values account for 5% of the data in the registration column.
Missing values account for 13% of the data in the cn/ln column.
Cleaning the time column is beyond my current ability.
I decide to drop those 5 columns.
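One way to compute these percentages and drop the affected columns is sketched below. The data frame, its column names, and its values are made up for illustration, and the sketch assumes missing entries are stored as `NA`. If the raw file marks them with empty strings or another placeholder, those would need to be converted to `NA` first.

```r
# Toy data with some missing entries (made-up values, hypothetical column names).
crashes <- data.frame(
  Time     = c("17:18", NA, NA, "06:30"),
  Flight   = c(NA, NA, NA, "101"),
  Operator = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# is.na() gives a TRUE/FALSE matrix; the column means are the share of missing values.
missing_pct <- round(colMeans(is.na(crashes)) * 100)

# Drop the columns judged too incomplete to use (names chosen for this example only).
cols_to_drop  <- c("Time", "Flight")
crashes_clean <- crashes[, !(names(crashes) %in% cols_to_drop), drop = FALSE]
```

Here `missing_pct` is a named vector, so the percentage for a single column can be read with `missing_pct[["Time"]]`.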
The ‘location’ column is in the format city/region/country. I decide to split the values in the ‘location’ column so I can perform analysis on individual areas.
First, I use the str_split_fixed() function to split the column at the first comma into two parts. For example: Victoria, British, Canada becomes |Victoria| and |British, Canada|.
Second, I use the gsub() function to erase everything up to and including the comma in the second part. For example: British, Canada becomes Canada. Finally, I add that column to my data set and name it Country.
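The two steps can be sketched like this. The data frame name `crashes`, the column name `Location`, the sample rows, and the exact regular expression are assumptions. The original chunk may differ, for example in how it handles surrounding spaces.

```r
library(stringr)

# Toy locations (made-up rows; the first one is the example from the text).
crashes <- data.frame(
  Location = c("Victoria, British, Canada", "Paris, France"),
  stringsAsFactors = FALSE
)

# Step 1: split at the first comma into exactly two pieces.
# The result is a character matrix with one row per location and two columns.
parts <- str_split_fixed(crashes$Location, ",", 2)

# Step 2: in the second piece, erase everything up to and including the last comma,
# then trim the leftover whitespace. " British, Canada" becomes "Canada".
crashes$Country <- trimws(gsub("^.*,", "", parts[, 2]))
```

A location with no comma at all gives an empty second piece, so its Country would be an empty string and would need separate handling.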
Looking at the number of fatalities and the number of people aboard, I can obtain the number of survivors by subtracting the number of fatalities from the number aboard. Therefore, I subtract the values in the Fatalities column from the values in the Aboard column, add the results to my data set as a new column, and name it Survival.
I also compute the crew fatality ratio as crew fatalities / crew aboard.
Likewise, I compute the passenger fatality ratio as passenger fatalities / passengers aboard.
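These derived columns can be sketched as below. All column names other than Aboard, Fatalities, and Survival are hypothetical, the values are made up, and the helper `safe_ratio()` is an addition for this example. It returns `NA` when nobody in that group was aboard, since dividing zero by zero would otherwise give `NaN`.

```r
# Toy counts for three crashes (made-up values, hypothetical column names).
crashes <- data.frame(
  Aboard              = c(10, 5, 3),
  Fatalities          = c(4, 5, 0),
  PassengersAboard    = c(8, 3, 0),
  PassengerFatalities = c(3, 3, 0),
  CrewAboard          = c(2, 2, 3),
  CrewFatalities      = c(1, 2, 0)
)

# Survivors = everyone aboard minus everyone who died.
crashes$Survival <- crashes$Aboard - crashes$Fatalities

# Hypothetical helper: ratio of deaths to people aboard, NA when the group was empty.
safe_ratio <- function(deaths, aboard) {
  ifelse(aboard > 0, deaths / aboard, NA_real_)
}

crashes$CrewFatalityRatio <- safe_ratio(crashes$CrewFatalities, crashes$CrewAboard)
crashes$PassengerFatalityRatio <- safe_ratio(crashes$PassengerFatalities,
                                             crashes$PassengersAboard)
```

A ratio of 1 means everyone in that group died and 0 means everyone survived, which makes crew and passengers directly comparable even when the groups differ in size.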
In Texas, traffic accidents are something we witness with our own eyes almost every week, if not every day. Aviation accidents, however, are much harder to witness and learn about. Getting such a large block of metal into the sky takes a great deal of human effort, so we are curious about the impact on the people involved when something that big comes down.
DATA DESCRIPTION
Date: Date of accident, in the format - January 01, 2001
Time: Local time, in 24 hr. format unless otherwise specified
Operator: Airline or operator of the aircraft
Flight #: Flight number assigned by the aircraft operator
Route: Complete or partial route flown prior to the accident
AC Type: Aircraft type
Reg: ICAO registration of the aircraft
cn / ln: Construction or serial number / Line or fuselage number
Aboard: Total aboard (passengers / crew)
Passengers aboard: Total passengers aboard
Crew aboard: Total crew aboard
All fatalities : Total fatalities aboard (passengers / crew)
Passenger fatalities: Total Passenger fatalities
Crew fatalities: Total Crew fatalities
Ground: Total killed on the ground
Summary: Brief description of the accident and cause if known
Domain question:
How has our chance of surviving an airplane crash, or of avoiding one altogether, evolved over 113 years (1908-2021)?
Other questions:
What factors have contributed to airplane crashes over 113 years?
How does our chance of survival depend on our role on the plane?
How much damage is caused when an airplane crashes, measured in number of deaths?
Is there a safe month or day to fly?
What type of airplane is the most dangerous to fly?
What operator is the most dangerous to fly with?
What are the most dangerous countries to fly in?
What is the pattern of the number of passengers on board and the number of fatalities over the years?
What does the chance of survival look like over the years?
Let’s zoom in on the year 1999 to see why the number of survivors was greater than the number of fatalities.
What pattern of damage does an airplane crash usually cause?
Let’s look from another perspective to better see the pattern in how airplane crashes cost lives.
What time periods have the most crashes?
What operators have the most crashes?
What type of airplane has the most crashes?
What do the number of fatalities on board and the number of fatalities on the ground look like over the years?
What month has the most crashes?
Does the chance of getting into an airplane crash vary by day?
What countries have the most airplane crashes?
Does the chance of survival vary by role on the airplane?
Most of the time, when a plane crashed, the passengers on board either mostly died or mostly survived. The survival rate of passengers on board over 1908-2021 has been relatively low. Deaths on the ground caused by plane crashes, by contrast, have almost never occurred. When they do occur, though, a plane crash can kill thousands of people on the ground, as in the 9/11 event.
The three countries with the most crashes, and so apparently the most dangerous to fly in, are Russia, the United States, and Brazil. However, this result is questionable because those countries may simply have high air traffic, so we cannot conclude anything yet. In terms of the time of year to fly, the chances of being involved in a plane crash are fairly similar across all months and days of the year. On top of that, whether a person on a flight is a crew member or a passenger, the chances of survival remain the same.
The period with the most plane crashes is wartime, and the type of plane involved in the most crashes (Douglas DC-3) is the type used in the war. We have not found any commercial type of airplane with an outstandingly high number of crashes. In short, apart from wartime and the points we have not yet verified, air travel looks quite safe from 1999 to the present.